Devanagari OCR using a recognition driven segmentation framework and stochastic language models
Internal identifier: 000A85 (Main/Exploration); previous: 000A84; next: 000A86
Authors: Suryaprakash Kompalli [United States]; Srirangaraj Setlur [United States]; Venugopal Govindaraju [United States]
Source:
- International journal on document analysis and recognition (Print) [ISSN 1433-2833]; 2009.
French descriptors
- Pascal (Inist)
- Reconnaissance caractère, Reconnaissance optique caractère, Concordance forme, Classification, Mot, Langage naturel, Automate stochastique, Automate fini, Machine état fini, Linguistique, Reconnaissance forme, Traitement image, Segmentation, Approche probabiliste, Modélisation, Méthode graphe, Théorie graphe, Appariement image, Modèle n gramme.
- Wicri :
- topic : Classification, Linguistique.
English descriptors
- KwdEn :
- Character recognition, Classification, Finite automaton, Finite state machine, Graph method, Graph theory, Image matching, Image processing, Linguistics, Modeling, N gram model, Natural language, Optical character recognition, Pattern matching, Pattern recognition, Probabilistic approach, Segmentation, Stochastic automaton, Word.
Abstract
This paper describes a novel recognition-driven segmentation methodology for Devanagari optical character recognition (OCR). Prior approaches have used sequential rules to segment characters, followed by template matching for classification. Our method instead uses a graph representation to segment characters. This allows us to segment horizontally or vertically overlapping characters, as well as characters connected along non-linear boundaries, into finer primitive components. The components are then processed by a classifier, and the classifier score is used to determine whether a component needs to be segmented further. Multiple hypotheses are obtained for each composite character by considering all possible combinations of the classifier results for its primitive components. Word recognition is performed by a stochastic finite state automaton (SFSA) that takes into account both classifier scores and character frequencies. A novel feature of our approach is that we use sub-character primitive components in the classification stage to reduce the number of classes, whereas word recognition uses an n-gram language model based on linguistic character units.
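The abstract describes hypothesis generation and SFSA-based word recognition only at a high level. The Python sketch below illustrates the general idea under stated assumptions: classifier scores are treated as probabilities, the language model is a bigram table over character units, and decoding is a standard Viterbi search in which the automaton state is the previously recognized unit. The function names (`composite_hypotheses`, `decode_word`), the `floor` back-off value, and all scores and probabilities are hypothetical and are not taken from the paper; the authors' actual automaton, n-gram order, and score combination may differ.

```python
# Hypothetical sketch (not the authors' code): hypothesis generation for
# composite characters plus bigram-weighted word decoding.
import math
from itertools import product


def composite_hypotheses(component_results):
    """Enumerate all combinations of classifier results for the primitive
    components of one composite character.

    component_results: one list per primitive component, each holding
    (label, score) pairs with scores in (0, 1]. Labels are joined in order,
    so a consonant label followed by a vowel-sign label yields the
    composite character (e.g. "क" + "ा" -> "का").
    Returns (composite_label, product_of_scores) pairs, best first.
    """
    hyps = []
    for combo in product(*component_results):
        label = "".join(lbl for lbl, _ in combo)
        score = math.prod(s for _, s in combo)
        hyps.append((label, score))
    return sorted(hyps, key=lambda h: h[1], reverse=True)


def decode_word(lattice, bigram, start="<s>", end="</s>", floor=1e-6):
    """Viterbi search over a lattice of composite-character hypotheses.

    lattice: one list of (unit, classifier_score) pairs per position.
    bigram: dict mapping (previous_unit, unit) -> probability. It acts as
    the transition weights of a stochastic finite state automaton whose
    state is the previously recognized unit. Unseen pairs get `floor`,
    a crude stand-in for proper smoothing (an assumption).
    Returns (best_word, log_score).
    """
    # best[state] = (log score, units so far) of the best path ending in state
    best = {start: (0.0, [])}
    for hyps in lattice:
        nxt = {}
        for prev, (lp, units) in best.items():
            for unit, cls in hyps:
                s = lp + math.log(cls) + math.log(bigram.get((prev, unit), floor))
                if unit not in nxt or s > nxt[unit][0]:
                    nxt[unit] = (s, units + [unit])
        best = nxt
    # Close the word with the end-of-word transition and keep the best path.
    score, units = max(
        (lp + math.log(bigram.get((prev, end), floor)), units)
        for prev, (lp, units) in best.items()
    )
    return "".join(units), score


# Illustrative numbers only: the classifier slightly prefers "भ" at
# position 2, but the bigram model favors "म" after "क".
lattice = [
    [("क", 0.6), ("फ", 0.4)],
    [("म", 0.45), ("भ", 0.55)],
    [("ल", 0.9)],
]
bigram = {
    ("<s>", "क"): 0.5, ("<s>", "फ"): 0.1,
    ("क", "म"): 0.4, ("क", "भ"): 0.05,
    ("फ", "म"): 0.1, ("फ", "भ"): 0.1,
    ("म", "ल"): 0.5, ("भ", "ल"): 0.2,
    ("ल", "</s>"): 0.6,
}
word, score = decode_word(lattice, bigram)
print(word)  # कमल
```

With an empty bigram table every transition receives the same `floor` weight, so the decoder falls back to the classifier's top choice at each position ("कभल" in this example); the language model is what pulls the result to "कमल". Keeping only the previous unit as the automaton state makes the Viterbi search exact for a bigram model; a higher-order n-gram would need states that remember more history.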
Affiliations:
- United States
- New York State
- Buffalo (New York)
- State University of New York, State University of New York at Buffalo
Links toward previous steps (curation, corpus...)
- to stream PascalFrancis, to step Corpus: 000194
- to stream PascalFrancis, to step Curation: 000583
- to stream PascalFrancis, to step Checkpoint: 000201
- to stream Main, to step Merge: 000A95
- to stream Main, to step Curation: 000A85
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Devanagari OCR using a recognition driven segmentation framework and stochastic language models</title>
<author><name sortKey="Kompalli, Suryaprakash" sort="Kompalli, Suryaprakash" uniqKey="Kompalli S" first="Suryaprakash" last="Kompalli">Suryaprakash Kompalli</name>
<affiliation wicri:level="4"><inist:fA14 i1="01"><s1>Department of Computer Science and Engineering, University at Buffalo, State University of New York</s1>
<s2>Buffalo</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<wicri:noRegion>Buffalo</wicri:noRegion>
<orgName type="university">Université d'État de New York à Buffalo</orgName>
<placeName><settlement type="city">Buffalo (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Setlur, Srirangaraj" sort="Setlur, Srirangaraj" uniqKey="Setlur S" first="Srirangaraj" last="Setlur">Srirangaraj Setlur</name>
<affiliation wicri:level="4"><inist:fA14 i1="01"><s1>Department of Computer Science and Engineering, University at Buffalo, State University of New York</s1>
<s2>Buffalo</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<wicri:noRegion>Buffalo</wicri:noRegion>
<orgName type="university">Université d'État de New York à Buffalo</orgName>
<placeName><settlement type="city">Buffalo (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Govindaraju, Venu" sort="Govindaraju, Venu" uniqKey="Govindaraju V" first="Venu" last="Govindaraju">Venugopal Govindaraju</name>
<affiliation wicri:level="4"><inist:fA14 i1="01"><s1>Department of Computer Science and Engineering, University at Buffalo, State University of New York</s1>
<s2>Buffalo</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<wicri:noRegion>Buffalo</wicri:noRegion>
<orgName type="university">Université d'État de New York à Buffalo</orgName>
<placeName><settlement type="city">Buffalo (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
<orgName type="university" n="3">Université d'État de New York à Buffalo</orgName>
<orgName type="institution">Université d'État de New York</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">10-0180818</idno>
<date when="2009">2009</date>
<idno type="stanalyst">PASCAL 10-0180818 INIST</idno>
<idno type="RBID">Pascal:10-0180818</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000194</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000583</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000201</idno>
<idno type="wicri:doubleKey">1433-2833:2009:Kompalli S:devanagari:ocr:using</idno>
<idno type="wicri:Area/Main/Merge">000A95</idno>
<idno type="wicri:Area/Main/Curation">000A85</idno>
<idno type="wicri:Area/Main/Exploration">000A85</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Devanagari OCR using a recognition driven segmentation framework and stochastic language models</title>
<author><name sortKey="Kompalli, Suryaprakash" sort="Kompalli, Suryaprakash" uniqKey="Kompalli S" first="Suryaprakash" last="Kompalli">Suryaprakash Kompalli</name>
<affiliation wicri:level="4"><inist:fA14 i1="01"><s1>Department of Computer Science and Engineering, University at Buffalo, State University of New York</s1>
<s2>Buffalo</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<wicri:noRegion>Buffalo</wicri:noRegion>
<orgName type="university">Université d'État de New York à Buffalo</orgName>
<placeName><settlement type="city">Buffalo (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Setlur, Srirangaraj" sort="Setlur, Srirangaraj" uniqKey="Setlur S" first="Srirangaraj" last="Setlur">Srirangaraj Setlur</name>
<affiliation wicri:level="4"><inist:fA14 i1="01"><s1>Department of Computer Science and Engineering, University at Buffalo, State University of New York</s1>
<s2>Buffalo</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<wicri:noRegion>Buffalo</wicri:noRegion>
<orgName type="university">Université d'État de New York à Buffalo</orgName>
<placeName><settlement type="city">Buffalo (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Govindaraju, Venu" sort="Govindaraju, Venu" uniqKey="Govindaraju V" first="Venu" last="Govindaraju">Venugopal Govindaraju</name>
<affiliation wicri:level="4"><inist:fA14 i1="01"><s1>Department of Computer Science and Engineering, University at Buffalo, State University of New York</s1>
<s2>Buffalo</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<wicri:noRegion>Buffalo</wicri:noRegion>
<orgName type="university">Université d'État de New York à Buffalo</orgName>
<placeName><settlement type="city">Buffalo (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
<orgName type="university" n="3">Université d'État de New York à Buffalo</orgName>
<orgName type="institution">Université d'État de New York</orgName>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">International journal on document analysis and recognition : (Print)</title>
<title level="j" type="abbreviated">Int. j. doc. anal. recognit. : (Print)</title>
<idno type="ISSN">1433-2833</idno>
<imprint><date when="2009">2009</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">International journal on document analysis and recognition : (Print)</title>
<title level="j" type="abbreviated">Int. j. doc. anal. recognit. : (Print)</title>
<idno type="ISSN">1433-2833</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Character recognition</term>
<term>Classification</term>
<term>Finite automaton</term>
<term>Finite state machine</term>
<term>Graph method</term>
<term>Graph theory</term>
<term>Image matching</term>
<term>Image processing</term>
<term>Linguistics</term>
<term>Modeling</term>
<term>N gram model</term>
<term>Natural language</term>
<term>Optical character recognition</term>
<term>Pattern matching</term>
<term>Pattern recognition</term>
<term>Probabilistic approach</term>
<term>Segmentation</term>
<term>Stochastic automaton</term>
<term>Word</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Reconnaissance caractère</term>
<term>Reconnaissance optique caractère</term>
<term>Concordance forme</term>
<term>Classification</term>
<term>Mot</term>
<term>Langage naturel</term>
<term>Automate stochastique</term>
<term>Automate fini</term>
<term>Machine état fini</term>
<term>Linguistique</term>
<term>Reconnaissance forme</term>
<term>Traitement image</term>
<term>Segmentation</term>
<term>Approche probabiliste</term>
<term>Modélisation</term>
<term>Méthode graphe</term>
<term>Théorie graphe</term>
<term>Appariement image</term>
<term>Modèle n gramme</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr"><term>Classification</term>
<term>Linguistique</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">This paper describes a novel recognition-driven segmentation methodology for Devanagari optical character recognition (OCR). Prior approaches have used sequential rules to segment characters, followed by template matching for classification. Our method instead uses a graph representation to segment characters. This allows us to segment horizontally or vertically overlapping characters, as well as characters connected along non-linear boundaries, into finer primitive components. The components are then processed by a classifier, and the classifier score is used to determine whether a component needs to be segmented further. Multiple hypotheses are obtained for each composite character by considering all possible combinations of the classifier results for its primitive components. Word recognition is performed by a stochastic finite state automaton (SFSA) that takes into account both classifier scores and character frequencies. A novel feature of our approach is that we use sub-character primitive components in the classification stage to reduce the number of classes, whereas word recognition uses an n-gram language model based on linguistic character units.</div>
</front>
</TEI>
<affiliations><list><country><li>États-Unis</li>
</country>
<region><li>État de New York</li>
</region>
<settlement><li>Buffalo (New York)</li>
</settlement>
<orgName><li>Université d'État de New York</li>
<li>Université d'État de New York à Buffalo</li>
</orgName>
</list>
<tree><country name="États-Unis"><region name="État de New York"><name sortKey="Kompalli, Suryaprakash" sort="Kompalli, Suryaprakash" uniqKey="Kompalli S" first="Suryaprakash" last="Kompalli">Suryaprakash Kompalli</name>
</region>
<name sortKey="Govindaraju, Venu" sort="Govindaraju, Venu" uniqKey="Govindaraju V" first="Venu" last="Govindaraju">Venugopal Govindaraju</name>
<name sortKey="Setlur, Srirangaraj" sort="Setlur, Srirangaraj" uniqKey="Setlur S" first="Srirangaraj" last="Setlur">Srirangaraj Setlur</name>
</country>
</tree>
</affiliations>
</record>
To process this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000A85 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000A85 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= Pascal:10-0180818 |texte= Devanagari OCR using a recognition driven segmentation framework and stochastic language models }}
This area was generated with Dilib version V0.6.32.